Recognising multi-digit numbers in photographs captured at street level is an important component of modern map making. A classic example of a corpus of such street-level photographs is Google's Street View imagery, composed of hundreds of millions of geo-located 360-degree panoramic images.
The ability to automatically transcribe an address number from a geo-located patch of pixels and associate the transcribed number with a known street address helps pinpoint, with a high degree of accuracy, the location of the building it represents. More broadly, recognising numbers in photographs is a problem of interest to the optical character recognition community.
While OCR on constrained domains like document processing is well studied, arbitrary multi-character text recognition in photographs is still highly challenging. This difficulty arises due to the wide variability in the visual appearance of text in the wild on account of a large range of fonts, colours, styles, orientations, and character arrangements.
The recognition problem is further complicated by environmental factors such as lighting, shadows, specularity, and occlusions, as well as by image acquisition factors such as resolution, motion, and focus blur. In this project, we will use a version of the dataset in which each image is centred on a single digit (many of the images do contain some distractor digits at the sides). Although this sample of the data is simpler than the full task, the distractors still make it more challenging than MNIST.
We will build a digit classifier on the SVHN (Street View House Numbers) dataset.
import h5py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.style as style; style.use('fivethirtyeight')
from IPython.display import clear_output,Markdown
%matplotlib inline
# plt.rcParams["figure.figsize"] = (15,10)
pd.set_option('display.max_colwidth', 500)
from sklearn.metrics import roc_auc_score, mean_absolute_error, r2_score
from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, accuracy_score, confusion_matrix, precision_score, recall_score, f1_score, precision_recall_curve, auc
import sys
from itertools import permutations
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense, BatchNormalization, Activation, Dropout, Cropping2D
from tensorflow.keras.models import Model, Sequential
from tensorflow.keras.initializers import RandomNormal,Constant
from tensorflow.keras.utils import to_categorical
import optuna
dataset = h5py.File('Part - 4 - Autonomous_Vehicles_SVHN_single_grey1.h5','r')
dset_keys = dataset.keys()
display(Markdown("#### Dataset keys are:"))
print(f"{[i for i in dset_keys]}")
['X_test', 'X_train', 'X_val', 'y_test', 'y_train', 'y_val']
X_test = dataset['X_test'][:]
X_train = dataset['X_train'][:]
X_val = dataset['X_val'][:]
y_test = dataset['y_test'][:]
y_train = dataset['y_train'][:]
y_val = dataset['y_val'][:]
display(Markdown("#### Dataset shapes are:"))
X_test.shape, X_train.shape , X_val.shape, y_test.shape, y_train.shape, y_val.shape
((18000, 32, 32), (42000, 32, 32), (60000, 32, 32), (18000,), (42000,), (60000,))
display(Markdown("#### Unique Labels"))
print('Unique labels in y_train:', np.unique(y_train))
print('Unique labels in y_val:', np.unique(y_val))
print('Unique labels in y_test:', np.unique(y_test))
Unique labels in y_train: [0 1 2 3 4 5 6 7 8 9]
Unique labels in y_val: [0 1 2 3 4 5 6 7 8 9]
Unique labels in y_test: [0 1 2 3 4 5 6 7 8 9]
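Beyond checking the unique labels, it can also be useful to confirm that the classes are roughly balanced before training. A minimal sketch using `np.bincount` (the helper name and demo labels here are illustrative, not part of the notebook):

```python
import numpy as np

def class_distribution(y, n_class=10):
    """Return the fraction of samples falling in each class label."""
    counts = np.bincount(y, minlength=n_class)
    return counts / counts.sum()

# Demo with a small synthetic label vector
y_demo = np.array([0, 1, 1, 2, 2, 2, 9, 9])
print(class_distribution(y_demo))
```

Applying the same function to `y_train` would reveal any class imbalance worth accounting for in the loss or the evaluation metric.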
# Visualizing first 10 images in the dataset and their labels
plt.figure(figsize = (15, 4.5))
for i in range(10):
plt.subplot(1, 10, i+1)
plt.imshow(X_train[i].reshape((32, 32)),cmap = plt.cm.binary)
plt.axis('off')
plt.subplots_adjust(wspace = -0.1, hspace = -0.1)
plt.show()
print('Label for each of the above images: %s' % (y_train[0 : 10]))
Label for each of the above images: [2 6 7 4 4 0 3 0 7 3]
display(Markdown(f"#### First Image and Label in the training set -- `{y_train[0]}` \n\n"))
plt.imshow(X_train[0], cmap = plt.cm.binary)
plt.show()
display(Markdown(f"#### Checking first image and label in validation set -- `{y_val[0]}` \n\n"))
plt.imshow(X_val[0], cmap = plt.cm.binary)
plt.show()
display(Markdown(f"#### Checking first image and label in test set -- `{y_test[0]}`\n\n"))
plt.imshow(X_test[0], cmap = plt.cm.binary)
plt.show()
class DNN_Model():
def __init__(self,type='regression'):
self.type = type
def create_model(self,in_shape,outs,**kwargs):
self.in_shape = in_shape
if kwargs != {}:
self.num_layers = kwargs['num_layers'] if 'num_layers' in kwargs else 1
self.batch_size = kwargs['batch_size'] if 'batch_size' in kwargs else 10
self.activation = kwargs['activation'] if 'activation' in kwargs else 'relu'
self.kernel_init = kwargs['kernel_init'] if 'kernel_init' in kwargs else 'uniform'
# self.optimizer = kwargs['opts'] if 'opts' in kwargs else tf.keras.optimizers.Adam(learning_rate=0.001)
self.neurons_per_layer = kwargs['neurons_per_layer'] if 'neurons_per_layer' in kwargs else [64]*self.num_layers
self.drop_rate_per_layer = kwargs['drop_rate_per_layer'] if 'drop_rate_per_layer' in kwargs else [0.2]*self.num_layers
self.n_epochs = kwargs['n_epochs'] if 'n_epochs' in kwargs else 50
self.arch = kwargs['arch'] if 'arch' in kwargs else 'B A D'
self.momentum = kwargs['momentum'] if 'momentum' in kwargs else 0.99
self.epsilon = kwargs['epsilon'] if 'epsilon' in kwargs else 0.005
self.beta_init_std = kwargs['beta_init_std'] if 'beta_init_std' in kwargs else 0.05
self.gamma_init = kwargs['gamma_init'] if 'gamma_init' in kwargs else 0.9
self.center = kwargs['center'] if 'center' in kwargs else True
self.scale = kwargs['scale'] if 'scale' in kwargs else False
if 'opts' in kwargs:
# check if the kwargs is a class object or a dictionary
if hasattr(kwargs['opts'],'_name'):
self.optimizer = kwargs['opts']
else:
opt_conf = kwargs['opts']
opt_kwargs = {k: v for k, v in opt_conf.items() if k != 'optimizer'}
try:
self.optimizer = getattr(tf.optimizers, opt_conf['optimizer'])(**opt_kwargs)
except (AttributeError, TypeError):
# fall back to Adam if the named optimizer or its arguments are invalid
self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
else:
self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
else:
self.num_layers = 1
self.batch_size = 10
self.activation = 'relu'
self.kernel_init = 'normal'
self.optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
self.neurons_per_layer = [32] * self.num_layers
self.drop_rate_per_layer = [0.2] * self.num_layers
self.n_epochs = 50
self.arch = 'B A D'
self.momentum = 0.99
self.epsilon = 0.005
self.beta_init_std = 0.05
self.gamma_init = 0.9
self.center = True
self.scale = False
if self.type == 'regression':
self.op_activation = 'linear'
self.outs = 1
elif self.type == 'classification':
self.op_activation = 'softmax'
self.outs = outs
inputs = Input(shape=self.in_shape)
for layer in range(self.num_layers):
if layer == 0 :
m = Dense(self.neurons_per_layer[layer],kernel_initializer = self.kernel_init)(inputs)
else:
m = Dense(self.neurons_per_layer[layer],kernel_initializer = self.kernel_init)(m)
for keys in self.arch.split(' '):
if keys == 'A':
m = Activation(self.activation)(m)
elif keys == 'B':
m = BatchNormalization(momentum = self.momentum,
epsilon = self.epsilon,
beta_initializer = RandomNormal(mean=0.0, stddev=self.beta_init_std),
gamma_initializer = Constant(value=self.gamma_init),
center = self.center,
scale = self.scale)(m)
elif keys == 'D':
m = Dropout(self.drop_rate_per_layer[layer])(m)
outputs = Dense(self.outs, activation=self.op_activation,kernel_initializer = self.kernel_init)(m)
model = Model(inputs=inputs, outputs=outputs)
return(model)
def model_summary(self,model):
return model.summary()
def train_model(self,model,X_train,y_train,loss,metric,callbacks=None,validation_split=0.2,validation_data=None,verbose=1):
self.model = model
self.loss = loss
self.metric = metric
self.model.compile(loss=self.loss,
optimizer=self.optimizer,
metrics=self.metric)
# default to an empty callback list when none are supplied
if callbacks is None:
callbacks = []
if validation_data is None:
self.history = self.model.fit(X_train,
y_train,
epochs=self.n_epochs,
validation_split=validation_split,
callbacks=callbacks,
verbose=0)
else:
self.history = self.model.fit(X_train,
y_train,
epochs=self.n_epochs,
validation_data=validation_data,
callbacks=callbacks,
verbose=0)
return
def test_model(self,X_test,y_test,callbacks=None):
self.test_results = self.model.evaluate(X_test,y_test,batch_size=self.batch_size,callbacks=callbacks,verbose=0)
return self.test_results
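The `kwargs['x'] if 'x' in kwargs else default` pattern repeated throughout `create_model` can be expressed more compactly with `dict.get`. A small standalone sketch of the idea (the function name and defaults below are illustrative):

```python
def resolve_params(**kwargs):
    """Merge caller-supplied keyword arguments over a table of defaults."""
    defaults = {'num_layers': 1,
                'batch_size': 10,
                'activation': 'relu',
                'kernel_init': 'uniform'}
    # dict.get(key, default) returns the caller's value when present,
    # otherwise the default -- one line per parameter is no longer needed
    return {k: kwargs.get(k, v) for k, v in defaults.items()}

print(resolve_params(num_layers=2, activation='sigmoid'))
```

This also removes the need for the separate `else` branch that restates every default when `kwargs` is empty.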
class LivePlot(tf.keras.callbacks.Callback):
def __init__(self,refresh_rate=5,train_loss=None,train_metric=None):
self.validation_prefix = "val_"
self.refresh_rate = refresh_rate
self.train_loss = train_loss
self.val_loss = self.validation_prefix + train_loss
self.train_metric = train_metric
self.val_metric = self.validation_prefix + train_metric
# This function is called when the training begins
def on_train_begin(self, logs={}):
# Initialize the lists for holding the logs, losses and metrics
self.train_losses = []
self.train_metrics = []
self.val_losses = []
self.val_metrics = []
self.logs = []
# This function is called at the end of each epoch
def on_epoch_end(self, epoch, logs={}):
"""
Calculates and plots loss and metrics
"""
# Extract from the log
log_train_loss = logs.get(self.train_loss)
log_train_metric = logs.get(self.train_metric)
log_val_loss = logs.get(self.val_loss)
log_val_metric = logs.get(self.val_metric)
# Append the logs, losses and accuracies to the lists
self.logs.append(logs)
self.train_losses.append(log_train_loss)
self.train_metrics.append(log_train_metric)
self.val_losses.append(log_val_loss)
self.val_metrics.append(log_val_metric)
# Plot every `refresh_rate` epochs (default 5)
if epoch > 0 and epoch%self.refresh_rate == 0:
fig, ax = plt.subplots(1,2,figsize=(20,6))
clear_output(wait=True)
N = np.arange(0, len(self.train_losses))
sns.lineplot(x=N,y=self.train_losses,ax=ax[0],legend='brief',label=self.train_loss)
sns.lineplot(x=N, y = self.val_losses,ax=ax[0],legend='brief',label=self.val_loss)
ax[0].set_title(f'Loss over Epoch {epoch} - {log_train_loss:.3}/{log_val_loss:.3} - (T/V)\n',{'fontsize':20})
ax[0].set_xlabel('Epochs')
ax[0].set_ylabel(self.train_loss)
sns.lineplot(x=N,y=self.train_metrics,ax=ax[1],legend='brief',label=self.train_metric)
sns.lineplot(x=N,y=self.val_metrics,ax=ax[1],legend='brief',label=self.val_metric)
ax[1].set_title(f'Performance over Epoch {epoch} - {log_train_metric:.3}/{log_val_metric:.3} - (T/V) \n',{'fontsize':20})
ax[1].set_xlabel('Epochs')
ax[1].set_ylabel(self.train_metric)
plt.show()
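The mechanics of `LivePlot` can be illustrated without TensorFlow: Keras calls `on_train_begin` once, then `on_epoch_end` after every epoch with a `logs` dict, and the callback simply accumulates those values. A minimal stand-in with fake `logs` dicts and no plotting (the class name and demo values are illustrative):

```python
class HistoryRecorder:
    """Accumulates per-epoch loss values, as LivePlot does before plotting."""
    def on_train_begin(self):
        self.train_losses, self.val_losses = [], []

    def on_epoch_end(self, epoch, logs):
        self.train_losses.append(logs.get('loss'))
        self.val_losses.append(logs.get('val_loss'))

rec = HistoryRecorder()
rec.on_train_begin()
# simulate three epochs of training with hand-written logs dicts
for epoch, (tl, vl) in enumerate([(0.9, 1.0), (0.7, 0.8), (0.6, 0.75)]):
    rec.on_epoch_end(epoch, {'loss': tl, 'val_loss': vl})
print(rec.train_losses)  # one training-loss value per simulated epoch
```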
We will first flatten the 32 X 32 images, build a baseline model, and measure its performance. We will then crop the images to 22 X 22 and use those to build a second baseline model and performance score. We use `categorical_crossentropy` as the loss function and accuracy as the performance measure throughout the exercise.
Xo_train, Xo_test, Xo_val = X_train, X_test, X_val
yo_train, yo_test, yo_val = y_train, y_test, y_val
display(Markdown(f"#### Reshaping X data: (n, 32, 32) => (n, 1024)"))
Xo_train = Xo_train.reshape((Xo_train.shape[0], -1))
Xo_val = Xo_val.reshape((Xo_val.shape[0], -1))
Xo_test = Xo_test.reshape((Xo_test.shape[0], -1))
display(Markdown(f"#### Making sure that the values are float so that we can get decimal points after division"))
Xo_train = Xo_train.astype('float32')
Xo_val = Xo_val.astype('float32')
Xo_test = Xo_test.astype('float32')
display(Markdown(f"#### Normalizing the pixel values by dividing by the maximum pixel value (255)"))
Xo_train /= 255
Xo_val /= 255
Xo_test /= 255
display(Markdown(f"#### Converting y data into categorical (one-hot encoding)"))
yo_train = to_categorical(yo_train)
yo_val = to_categorical(yo_val)
yo_test = to_categorical(yo_test)
display(Markdown("#### Display Shapes"))
print('Xo_train shape:', Xo_train.shape)
print('Xo_val shape:', Xo_val.shape)
print('Xo_test shape:', Xo_test.shape)
print('\n')
print('yo_train shape:', yo_train.shape)
print('yo_val shape:', yo_val.shape)
print('yo_test shape:', yo_test.shape)
print('\n')
print('Number of images in X_train', Xo_train.shape[0])
print('Number of images in X_val', Xo_val.shape[0])
print('Number of images in X_test', Xo_test.shape[0])
Xo_train shape: (42000, 1024)
Xo_val shape: (60000, 1024)
Xo_test shape: (18000, 1024)

yo_train shape: (42000, 10)
yo_val shape: (60000, 10)
yo_test shape: (18000, 10)

Number of images in X_train 42000
Number of images in X_val 60000
Number of images in X_test 18000
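The `to_categorical` step above turns each integer label into a 10-dimensional indicator vector. The same encoding can be sketched in plain NumPy (the helper name below is illustrative):

```python
import numpy as np

def one_hot(y, n_class=10):
    """Integer labels -> one-hot matrix, mirroring keras to_categorical."""
    out = np.zeros((len(y), n_class), dtype='float32')
    out[np.arange(len(y)), y] = 1.0  # set a single 1 per row at the label index
    return out

print(one_hot(np.array([2, 6, 7])))  # each row has a single 1 at the label position
```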
# live plot refresh rate
refresh_rate = 2
# Model train/test score board
score_board = pd.DataFrame(columns=['Baseline','Loss','Metric','Notes'])
## Baseline Params
kwargs = {'num_layers': 2,
'arch' :'B A D',
'neurons_per_layer':[128,64],
'batch_size': 200,
'drop_rate_per_layer': [0.1,0.1],
'activation': 'sigmoid',
'n_epochs': 100,
'kernel_init': 'he_normal',
# 'lr': 0.001,
# 'opts': 'adam'
}
# output classes
n_class = len(np.unique(y_train))
# Loss and perf. metric being considered
loss = 'categorical_crossentropy'
metric = 'accuracy'
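For reference, the `categorical_crossentropy` loss chosen above reduces, for one-hot targets, to the negative log of the probability the model assigns to the true class, averaged over the batch. A NumPy sketch (illustrative, not Keras's exact implementation, which handles clipping and reductions internally):

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over the batch of -sum(y_true * log(y_pred))."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return float(-np.sum(y_true * np.log(y_pred), axis=1).mean())

# Two samples, three classes: true classes are 1 and 0
y_true = np.array([[0., 1., 0.], [1., 0., 0.]])
y_pred = np.array([[0.1, 0.8, 0.1], [0.7, 0.2, 0.1]])
print(categorical_crossentropy(y_true, y_pred))  # mean of -log(0.8) and -log(0.7)
```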
# Baseline Model
dnn_1 = DNN_Model('classification')
m_1 = dnn_1.create_model(Xo_train.shape[1],n_class,**kwargs)
display(Markdown("### Base Model Summary"))
m_1.summary()
Model: "model_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_11 (InputLayer)        [(None, 1024)]            0
_________________________________________________________________
dense_30 (Dense)             (None, 128)               131200
_________________________________________________________________
batch_normalization_20 (Batc (None, 128)               384
_________________________________________________________________
activation_20 (Activation)   (None, 128)               0
_________________________________________________________________
dropout_20 (Dropout)         (None, 128)               0
_________________________________________________________________
dense_31 (Dense)             (None, 64)                8256
_________________________________________________________________
batch_normalization_21 (Batc (None, 64)                192
_________________________________________________________________
activation_21 (Activation)   (None, 64)                0
_________________________________________________________________
dropout_21 (Dropout)         (None, 64)                0
_________________________________________________________________
dense_32 (Dense)             (None, 10)                650
=================================================================
Total params: 140,682
Trainable params: 140,298
Non-trainable params: 384
_________________________________________________________________
dnn_1 = DNN_Model('classification')
m_1 = dnn_1.create_model(Xo_train.shape[1],n_class,**kwargs)
callbacks = LivePlot(2,train_loss='loss',train_metric='accuracy')
dnn_1.train_model(m_1,Xo_train,yo_train,[loss],[metric],callbacks=[callbacks], validation_data = (Xo_val,yo_val))
result_1 = dnn_1.test_model(Xo_test,yo_test)
display(Markdown(f"### Test Loss : {result_1[0]} Test Accuracy : {result_1[1]}"))
score_board = score_board.append({'Baseline':'2 Layer - 32 X 32',
'Loss':result_1[0],
'Metric':result_1[1],
'Notes':'Loss: Categorical entropy -- Metric : Accuracy'},ignore_index=True)
score_board
| | Baseline | Loss | Metric | Notes |
|---|---|---|---|---|
| 0 | 2 Layer - 32 X 32 | 0.853956 | 0.739278 | Loss: Categorical entropy -- Metric : Accuracy |
We get 0.73928, or almost 74%, as the baseline score with the 32 X 32 images.
input_shape = (X_train.shape[1],X_train.shape[2],1)
input_shape
(32, 32, 1)
model_crop = Sequential()
model_crop.add(Cropping2D(cropping=((5, 5), (5, 5)), input_shape=input_shape))
model_crop.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
cropping2d_3 (Cropping2D)    (None, 22, 22, 1)         0
=================================================================
Total params: 0
Trainable params: 0
Non-trainable params: 0
_________________________________________________________________
input_train = X_train.reshape(X_train.shape[0], X_train.shape[1],X_train.shape[2],1)
input_test = X_test.reshape(X_test.shape[0], X_test.shape[1],X_test.shape[2],1)
input_val = X_val.reshape(X_val.shape[0], X_val.shape[1],X_val.shape[2],1)
input_train.shape, input_test.shape, input_val.shape
((42000, 32, 32, 1), (18000, 32, 32, 1), (60000, 32, 32, 1))
temp_img = model_crop.predict(input_train)
crop_img_width, crop_img_height = temp_img.shape[1], temp_img.shape[2]
crop_img_width, crop_img_height
(22, 22)
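As an aside, the same (5, 5), (5, 5) border crop can be obtained with plain NumPy slicing on the (n, 32, 32) arrays, without building a model or calling `predict`. A sketch (the helper name and demo array are illustrative):

```python
import numpy as np

def crop_border(images, crop=5):
    """Drop `crop` pixels from each side of a batch of (n, H, W) images."""
    return images[:, crop:-crop, crop:-crop]

demo = np.random.rand(2, 32, 32)
print(crop_border(demo).shape)  # (2, 22, 22)
```

The `Cropping2D` route used above has the advantage that the crop can later be made part of the model itself, but for offline preprocessing the slice is equivalent and much faster.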
output_train = model_crop.predict(input_train)
output_test = model_crop.predict(input_test)
output_val = model_crop.predict(input_val)
output_train.shape, output_test.shape, output_val.shape
((42000, 22, 22, 1), (18000, 22, 22, 1), (60000, 22, 22, 1))
Xc_train= output_train.reshape(42000,22, 22)
Xc_test= output_test.reshape(18000,22, 22)
Xc_val= output_val.reshape(60000,22, 22)
Xc_train.shape, Xc_test.shape, Xc_val.shape
((42000, 22, 22), (18000, 22, 22), (60000, 22, 22))
fig, axes = plt.subplots(1, 2)
print("Label: {}".format(y_train[1]))
axes[0].imshow(X_train[1])
axes[0].set_title('Original image')
axes[1].imshow(Xc_train[1])
axes[1].set_title('Cropped image')
fig.set_size_inches(9, 5, forward=True)
plt.show()
Label: 6
Xc_train.reshape((Xc_train.shape[0], -1)).shape
(42000, 484)
yc_train, yc_test, yc_val = y_train, y_test, y_val
display(Markdown(f"#### Reshaping X data: (n, 22, 22) => (n, 484)"))
Xc_train = Xc_train.reshape((Xc_train.shape[0], -1))
Xc_val = Xc_val.reshape((Xc_val.shape[0], -1))
Xc_test = Xc_test.reshape((Xc_test.shape[0], -1))
display(Markdown(f"#### Making sure that the values are float so that we can get decimal points after division"))
Xc_train = Xc_train.astype('float32')
Xc_val = Xc_val.astype('float32')
Xc_test = Xc_test.astype('float32')
display(Markdown(f"#### Normalizing the pixel values by dividing by the maximum pixel value (255)"))
Xc_train /= 255
Xc_val /= 255
Xc_test /= 255
display(Markdown(f"#### Converting y data into categorical (one-hot encoding)"))
yc_train = to_categorical(yc_train)
yc_val = to_categorical(yc_val)
yc_test = to_categorical(yc_test)
display(Markdown("#### Display Shapes"))
print('Xc_train shape:', Xc_train.shape)
print('Xc_val shape:', Xc_val.shape)
print('Xc_test shape:', Xc_test.shape)
print('\n')
print('yc_train shape:', yc_train.shape)
print('yc_val shape:', yc_val.shape)
print('yc_test shape:', yc_test.shape)
print('\n')
print('Number of images in X_train', Xc_train.shape[0])
print('Number of images in X_val', Xc_val.shape[0])
print('Number of images in X_test', Xc_test.shape[0])
Xc_train shape: (42000, 484)
Xc_val shape: (60000, 484)
Xc_test shape: (18000, 484)

yc_train shape: (42000, 10)
yc_val shape: (60000, 10)
yc_test shape: (18000, 10)

Number of images in X_train 42000
Number of images in X_val 60000
Number of images in X_test 18000
dnn_2= DNN_Model('classification')
m_2 = dnn_2.create_model(Xc_train.shape[1],n_class,**kwargs)
callbacks = LivePlot(2,train_loss='loss',train_metric='accuracy')
dnn_2.train_model(m_2,Xc_train,yc_train,[loss],[metric],callbacks=[callbacks], validation_data = (Xc_val,yc_val))
result_2 = dnn_2.test_model(Xc_test,yc_test)
display(Markdown(f"### Test Loss : {result_2[0]} Test Accuracy : {result_2[1]}"))
score_board = score_board.append({'Baseline':'2 Layer - 22 X 22 (cropped)',
'Loss':result_2[0],
'Metric':result_2[1],
'Notes':'Loss: Categorical entropy -- Metric : Accuracy'},ignore_index=True)
score_board
| | Baseline | Loss | Metric | Notes |
|---|---|---|---|---|
| 0 | 2 Layer - 32 X 32 | 0.853956 | 0.739278 | Loss: Categorical entropy -- Metric : Accuracy |
| 1 | 2 Layer - 22 X 22 (cropped) | 0.823071 | 0.776000 | Loss: Categorical entropy -- Metric : Accuracy |
We now tune the hyperparameters with Optuna in stages. The first stage searches over the architecture: the number of layers, the neurons per layer, and the ordering of Activation, BatchNormalization and Dropout. At each stage, the parameters determined in the previous stage flow into the next one to override the defaults.
Note:
In the interest of time we have reduced the number of `Trials` (iterations). Increasing the number of `Trials` improves the chances of finding a better set of parameters, but does not guarantee it.
hp_score_board = pd.DataFrame(columns=['Phase','Best Value','Hyperparameters'])
def objective_arch(trial):
# A = Activation
# B = BatchNormalization
# D = Dropout
global loss,metric,Xc_train, yc_train, Xc_test, yc_test, Xc_val, yc_val
epochs = 100
lr = 0.001
batch_size=200
num_layers = trial.suggest_categorical('num_layers',[1,2,3,4,5,6,7])
arch = trial.suggest_categorical('arch',[' '.join(i) for i in permutations(['A','B','D'],3)])
neurons_per_layer = [trial.suggest_categorical(f'neuron_l{layers}',[10,32,64,128,256,512,1024]) for layers in range(1,num_layers+1)]
opts = 'adam'
if opts == 'sgd':
optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
elif opts == 'adadelta':
optimizer = tf.keras.optimizers.Adadelta(learning_rate=lr)
elif opts == 'adam':
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
elif opts == 'rmsprop':
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr)
kwargs = {'num_layers': num_layers,
'arch' :arch,
'neurons_per_layer':neurons_per_layer,
'batch_size': batch_size,
'drop_rate_per_layer': [0.1]*num_layers,
'activation': 'relu',
'n_epochs': epochs,
'kernel_init': 'he_normal',
'opts': optimizer}
dnn_1 = DNN_Model('classification')
m_1 = dnn_1.create_model(Xc_train.shape[1],n_class,**kwargs)
callbacks = []
dnn_1.train_model(m_1,Xc_train,yc_train,[loss],[metric],callbacks=callbacks, validation_data = (Xc_val,yc_val))
result = dnn_1.test_model(Xc_test,yc_test)
return result[1]
study_arch = optuna.create_study(direction='maximize',sampler=optuna.samplers.TPESampler())
study_arch.optimize(objective_arch,n_trials=10,n_jobs=-1)
[I 2021-03-12 15:29:53,138] A new study created in memory with name: no-name-f5f67461-99b3-4ca6-b630-541f9fde7fdc
[I 2021-03-12 15:42:37,428] Trial 4 finished with value: 0.8133888840675354 and parameters: {'num_layers': 2, 'arch': 'B D A', 'neuron_l1': 32, 'neuron_l2': 1024}. Best is trial 4 with value: 0.8133888840675354.
[I 2021-03-12 15:44:27,935] Trial 0 finished with value: 0.8334444165229797 and parameters: {'num_layers': 2, 'arch': 'B A D', 'neuron_l1': 512, 'neuron_l2': 128}. Best is trial 0 with value: 0.8334444165229797.
[I 2021-03-12 15:45:19,663] Trial 1 finished with value: 0.6240000128746033 and parameters: {'num_layers': 3, 'arch': 'D A B', 'neuron_l1': 32, 'neuron_l2': 128, 'neuron_l3': 1024}. Best is trial 0 with value: 0.8334444165229797.
[I 2021-03-12 15:54:12,382] Trial 2 finished with value: 0.7188888788223267 and parameters: {'num_layers': 7, 'arch': 'A D B', 'neuron_l1': 1024, 'neuron_l2': 128, 'neuron_l3': 64, 'neuron_l4': 128, 'neuron_l5': 32, 'neuron_l6': 128, 'neuron_l7': 256}. Best is trial 0 with value: 0.8334444165229797.
[I 2021-03-12 15:54:53,130] Trial 3 finished with value: 0.36622223258018494 and parameters: {'num_layers': 4, 'arch': 'D A B', 'neuron_l1': 10, 'neuron_l2': 128, 'neuron_l3': 1024, 'neuron_l4': 1024}. Best is trial 0 with value: 0.8334444165229797.
[I 2021-03-12 15:57:41,103] Trial 5 finished with value: 0.8496666550636292 and parameters: {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256}. Best is trial 5 with value: 0.8496666550636292.
[I 2021-03-12 16:00:58,521] Trial 9 finished with value: 0.777055561542511 and parameters: {'num_layers': 1, 'arch': 'B A D', 'neuron_l1': 64}. Best is trial 5 with value: 0.8496666550636292.
[I 2021-03-12 16:03:28,629] Trial 6 finished with value: 0.7252777814865112 and parameters: {'num_layers': 5, 'arch': 'D A B', 'neuron_l1': 128, 'neuron_l2': 1024, 'neuron_l3': 32, 'neuron_l4': 512, 'neuron_l5': 128}. Best is trial 5 with value: 0.8496666550636292.
[I 2021-03-12 16:04:04,377] Trial 8 finished with value: 0.7934444546699524 and parameters: {'num_layers': 4, 'arch': 'A B D', 'neuron_l1': 1024, 'neuron_l2': 32, 'neuron_l3': 256, 'neuron_l4': 256}. Best is trial 5 with value: 0.8496666550636292.
[I 2021-03-12 16:14:22,054] Trial 7 finished with value: 0.8456110954284668 and parameters: {'num_layers': 7, 'arch': 'D B A', 'neuron_l1': 1024, 'neuron_l2': 256, 'neuron_l3': 64, 'neuron_l4': 10, 'neuron_l5': 512, 'neuron_l6': 512, 'neuron_l7': 1024}. Best is trial 5 with value: 0.8496666550636292.
arch_best_params = study_arch.best_params
display(Markdown("#### Best recorded accuracy :"))
print(study_arch.best_value)
display(Markdown("#### Best set of parameters to determine the architecture :"))
print(arch_best_params)
hp_score_board = hp_score_board.append({'Phase':'Architecture Selection',
'Best Value':study_arch.best_value,
'Hyperparameters':f'{study_arch.best_params}'},ignore_index=True)
hp_score_board
| | Phase | Best Value | Hyperparameters |
|---|---|---|---|
| 0 | Architecture Selection | 0.849667 | {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256} |
Dropout, BatchNormalization and Activation in that order has yielded an accuracy of almost 85%, an increase of almost 7% over the cropped baseline.
optuna.visualization.plot_parallel_coordinate(study_arch)
optuna.visualization.plot_optimization_history(study_arch)
study_arch.trials_dataframe()
| | number | value | datetime_start | datetime_complete | duration | params_arch | params_neuron_l1 | params_neuron_l2 | params_neuron_l3 | params_neuron_l4 | params_neuron_l5 | params_neuron_l6 | params_neuron_l7 | params_num_layers | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.833444 | 2021-03-12 15:29:53.143478 | 2021-03-12 15:44:27.935319 | 0 days 00:14:34.791841 | B A D | 512 | 128.0 | NaN | NaN | NaN | NaN | NaN | 2 | COMPLETE |
| 1 | 1 | 0.624000 | 2021-03-12 15:29:53.144960 | 2021-03-12 15:45:19.662636 | 0 days 00:15:26.517676 | D A B | 32 | 128.0 | 1024.0 | NaN | NaN | NaN | NaN | 3 | COMPLETE |
| 2 | 2 | 0.718889 | 2021-03-12 15:29:53.148288 | 2021-03-12 15:54:12.382203 | 0 days 00:24:19.233915 | A D B | 1024 | 128.0 | 64.0 | 128.0 | 32.0 | 128.0 | 256.0 | 7 | COMPLETE |
| 3 | 3 | 0.366222 | 2021-03-12 15:29:53.149243 | 2021-03-12 15:54:53.129886 | 0 days 00:24:59.980643 | D A B | 10 | 128.0 | 1024.0 | 1024.0 | NaN | NaN | NaN | 4 | COMPLETE |
| 4 | 4 | 0.813389 | 2021-03-12 15:29:53.150771 | 2021-03-12 15:42:37.427418 | 0 days 00:12:44.276647 | B D A | 32 | 1024.0 | NaN | NaN | NaN | NaN | NaN | 2 | COMPLETE |
| 5 | 5 | 0.849667 | 2021-03-12 15:29:53.152356 | 2021-03-12 15:57:41.102577 | 0 days 00:27:47.950221 | D B A | 256 | 1024.0 | 256.0 | 512.0 | 256.0 | NaN | NaN | 5 | COMPLETE |
| 6 | 6 | 0.725278 | 2021-03-12 15:42:37.436857 | 2021-03-12 16:03:28.628168 | 0 days 00:20:51.191311 | D A B | 128 | 1024.0 | 32.0 | 512.0 | 128.0 | NaN | NaN | 5 | COMPLETE |
| 7 | 7 | 0.845611 | 2021-03-12 15:44:27.944194 | 2021-03-12 16:14:22.054167 | 0 days 00:29:54.109973 | D B A | 1024 | 256.0 | 64.0 | 10.0 | 512.0 | 512.0 | 1024.0 | 7 | COMPLETE |
| 8 | 8 | 0.793444 | 2021-03-12 15:45:19.668522 | 2021-03-12 16:04:04.376671 | 0 days 00:18:44.708149 | A B D | 1024 | 32.0 | 256.0 | 256.0 | NaN | NaN | NaN | 4 | COMPLETE |
| 9 | 9 | 0.777056 | 2021-03-12 15:54:12.388129 | 2021-03-12 16:00:58.521255 | 0 days 00:06:46.133126 | B A D | 64 | NaN | NaN | NaN | NaN | NaN | NaN | 1 | COMPLETE |
neurons_per_layer =[arch_best_params[f'neuron_l{i}'] for i in range(1,arch_best_params['num_layers']+1)]
arch_best_params['neurons_per_layer'] = neurons_per_layer
for i in range(1,arch_best_params['num_layers']+1):
if f'neuron_l{i}' in arch_best_params:
del arch_best_params[f'neuron_l{i}']
display(Markdown("#### Modified best parameters from architecture :"))
print(arch_best_params)
{'num_layers': 5, 'arch': 'D B A', 'neurons_per_layer': [256, 1024, 256, 512, 256]}
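The staged flow of parameters described earlier relies on Python dict merging: in the next objective, the coarse-stage keyword arguments are combined with the architecture-stage values via `{**a, **b}`, and on key collisions the right-hand dict wins. A small sketch with hypothetical values:

```python
# Hypothetical stage outputs, for illustration only
arch_stage = {'num_layers': 5, 'arch': 'D B A', 'activation': 'relu'}
coarse_stage = {'activation': 'sigmoid', 'kernel_init': 'he_normal'}

# Later-stage values override earlier ones on shared keys
merged = {**arch_stage, **coarse_stage}
print(merged['activation'])  # 'sigmoid' -- the later stage wins
```

This is why the ordering `{**arch_best_params, **kwargs}` in the coarse objective lets each stage refine, rather than discard, the previous stage's choices.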
def objective_coarse(trial):
global arch_best_params,loss,metric,Xc_train, yc_train, Xc_test, yc_test, Xc_val, yc_val
epochs = 100
lr = 0.001
batch_size = 200
drop_rate_per_layer = [trial.suggest_uniform(f'drop_l{layers}',0.1,0.90) for layers in range(1,arch_best_params['num_layers']+1)]
activation = trial.suggest_categorical('activation',['relu','sigmoid'])
kernel_init = trial.suggest_categorical('kernel_init',['he_normal','he_uniform','glorot_normal','glorot_uniform'])
opts = 'adam'
if opts == 'sgd':
optimizer = tf.keras.optimizers.SGD(learning_rate=lr)
elif opts == 'adadelta':
optimizer = tf.keras.optimizers.Adadelta(learning_rate=lr)
elif opts == 'adam':
optimizer = tf.keras.optimizers.Adam(learning_rate=lr)
elif opts == 'rmsprop':
optimizer = tf.keras.optimizers.RMSprop(learning_rate=lr)
kwargs = {
'batch_size': batch_size,
'drop_rate_per_layer' :drop_rate_per_layer,
'activation':activation,
'n_epochs': epochs,
'kernel_init': kernel_init,
'opts': optimizer,
}
kwargs = {**arch_best_params, **kwargs}
dnn_1 = DNN_Model('classification')
m_1 = dnn_1.create_model(Xc_train.shape[1],n_class,**kwargs)
callbacks = []
dnn_1.train_model(m_1,Xc_train,yc_train,[loss],[metric],callbacks=callbacks, validation_data = (Xc_val,yc_val))
result = dnn_1.test_model(Xc_test,yc_test)
return result[1]
study_coarse = optuna.create_study(direction='maximize',sampler=optuna.samplers.TPESampler())
study_coarse.optimize(objective_coarse,n_trials=12,n_jobs=-1)
[I 2021-03-12 16:19:22,566] A new study created in memory with name: no-name-866630d4-ddaa-42ce-af21-9b71c7c47cf6
[I 2021-03-12 17:01:30,661] Trial 3 finished with value: 0.7213333249092102 and parameters: {'drop_l1': 0.16594907964301875, 'drop_l2': 0.8248331074490886, 'drop_l3': 0.11088320482021877, 'drop_l4': 0.4994438248954397, 'drop_l5': 0.33026669773472267, 'activation': 'sigmoid', 'kernel_init': 'he_normal'}. Best is trial 3 with value: 0.7213333249092102.
[I 2021-03-12 17:01:31,344] Trial 2 finished with value: 0.7850555777549744 and parameters: {'drop_l1': 0.4114771943105613, 'drop_l2': 0.7777199514832734, 'drop_l3': 0.2695192084752896, 'drop_l4': 0.8111658045120915, 'drop_l5': 0.5316333893252891, 'activation': 'relu', 'kernel_init': 'glorot_normal'}. Best is trial 2 with value: 0.7850555777549744.
[I 2021-03-12 17:01:55,637] Trial 4 finished with value: 0.6452222466468811 and parameters: {'drop_l1': 0.3972659934350148, 'drop_l2': 0.6424725243815763, 'drop_l3': 0.8436577527534641, 'drop_l4': 0.7773138360870379, 'drop_l5': 0.7895287510027605, 'activation': 'relu', 'kernel_init': 'glorot_normal'}. Best is trial 2 with value: 0.7850555777549744.
[I 2021-03-12 17:02:19,093] Trial 1 finished with value: 0.7504444718360901 and parameters: {'drop_l1': 0.5310665051699842, 'drop_l2': 0.6391546480592274, 'drop_l3': 0.6073738142346741, 'drop_l4': 0.5120799043215419, 'drop_l5': 0.8145615138368237, 'activation': 'relu', 'kernel_init': 'he_uniform'}. Best is trial 2 with value: 0.7850555777549744.
[I 2021-03-12 17:02:19,318] Trial 5 finished with value: 0.7633333206176758 and parameters: {'drop_l1': 0.44190937818867615, 'drop_l2': 0.5490209861600642, 'drop_l3': 0.169513207009582, 'drop_l4': 0.5909458738586698, 'drop_l5': 0.26069697509270606, 'activation': 'sigmoid', 'kernel_init': 'he_normal'}. Best is trial 2 with value: 0.7850555777549744.
[I 2021-03-12 17:02:19,325] Trial 0 finished with value: 0.8255000114440918 and parameters: {'drop_l1': 0.47857472943848944, 'drop_l2': 0.1821176166554544, 'drop_l3': 0.19155325775654097, 'drop_l4': 0.7636804068451283, 'drop_l5': 0.19803398595097718, 'activation': 'sigmoid', 'kernel_init': 'glorot_normal'}. Best is trial 0 with value: 0.8255000114440918.
[I 2021-03-12 17:04:30,221] Trial 1 finished with value: 0.757777750492096 and parameters: {'drop_l1': 0.5625916753390173, 'drop_l2': 0.7273991182244415, 'drop_l3': 0.21068581103094744, 'drop_l4': 0.7688153766054248, 'drop_l5': 0.46710779404654, 'activation': 'relu', 'kernel_init': 'he_normal'}. Best is trial 1 with value: 0.757777750492096.
[I 2021-03-12 17:04:32,537] Trial 5 finished with value: 0.8447777628898621 and parameters: {'drop_l1': 0.3610563746611468, 'drop_l2': 0.38441325380529534, 'drop_l3': 0.28631438832536016, 'drop_l4': 0.17173743874407166, 'drop_l5': 0.25520706236072704, 'activation': 'relu', 'kernel_init': 'glorot_uniform'}. Best is trial 5 with value: 0.8447777628898621.
[I 2021-03-12 17:04:46,891] Trial 4 finished with value: 0.4949444532394409 and parameters: {'drop_l1': 0.824780491779742, 'drop_l2': 0.5944539214290161, 'drop_l3': 0.18041841623283422, 'drop_l4': 0.1937828953006541, 'drop_l5': 0.10109926094230347, 'activation': 'relu', 'kernel_init': 'he_normal'}. Best is trial 5 with value: 0.8447777628898621.
[I 2021-03-12 17:04:47,239] Trial 0 finished with value: 0.8467222452163696 and parameters: {'drop_l1': 0.3448559151858719, 'drop_l2': 0.3831795967089904, 'drop_l3': 0.15696913898122977, 'drop_l4': 0.2602366164051947, 'drop_l5': 0.6395512387224598, 'activation': 'relu', 'kernel_init': 'glorot_uniform'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:04:48,177] Trial 3 finished with value: 0.7353333234786987 and parameters: {'drop_l1': 0.1413845690005327, 'drop_l2': 0.8438886326723172, 'drop_l3': 0.5362995417256543, 'drop_l4': 0.4290965756316246, 'drop_l5': 0.8103923481593824, 'activation': 'relu', 'kernel_init': 'he_normal'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:05:37,319] Trial 2 finished with value: 0.7600555419921875 and parameters: {'drop_l1': 0.6779483208264727, 'drop_l2': 0.24325007775929858, 'drop_l3': 0.13957645192027118, 'drop_l4': 0.6398074081838485, 'drop_l5': 0.6714064695078303, 'activation': 'sigmoid', 'kernel_init': 'he_normal'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:32:38,904] Trial 6 finished with value: 0.8075000047683716 and parameters: {'drop_l1': 0.26026861342256324, 'drop_l2': 0.3556865530715745, 'drop_l3': 0.49535415145216954, 'drop_l4': 0.22166553171976908, 'drop_l5': 0.8763405061758626, 'activation': 'relu', 'kernel_init': 'glorot_uniform'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:32:41,261] Trial 7 finished with value: 0.8116111159324646 and parameters: {'drop_l1': 0.6063023060215158, 'drop_l2': 0.1377128394461539, 'drop_l3': 0.48072242292680956, 'drop_l4': 0.47795712975326043, 'drop_l5': 0.16080389130044798, 'activation': 'sigmoid', 'kernel_init': 'glorot_normal'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:32:51,165] Trial 9 finished with value: 0.7256110906600952 and parameters: {'drop_l1': 0.6822697216191934, 'drop_l2': 0.26532670317192447, 'drop_l3': 0.45489812223888704, 'drop_l4': 0.5979272460127666, 'drop_l5': 0.6494832606740021, 'activation': 'sigmoid', 'kernel_init': 'he_uniform'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:32:52,375] Trial 10 finished with value: 0.7862777709960938 and parameters: {'drop_l1': 0.2739707893096928, 'drop_l2': 0.6462500757511781, 'drop_l3': 0.8185139760193942, 'drop_l4': 0.6321142183939601, 'drop_l5': 0.682198720566156, 'activation': 'relu', 'kernel_init': 'he_uniform'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:32:52,447] Trial 8 finished with value: 0.3674444556236267 and parameters: {'drop_l1': 0.5516924810815531, 'drop_l2': 0.5381577108968797, 'drop_l3': 0.8794619685071172, 'drop_l4': 0.17766713653333988, 'drop_l5': 0.8578321607612219, 'activation': 'relu', 'kernel_init': 'he_normal'}. Best is trial 0 with value: 0.8467222452163696.
[I 2021-03-12 17:33:09,501] Trial 11 finished with value: 0.4888888895511627 and parameters: {'drop_l1': 0.6484694757055458, 'drop_l2': 0.21938469025278923, 'drop_l3': 0.2698781219282369, 'drop_l4': 0.3543055514394645, 'drop_l5': 0.8981381923143936, 'activation': 'relu', 'kernel_init': 'he_normal'}. Best is trial 0 with value: 0.8467222452163696.
study_coarse.best_value, study_coarse.best_params
(0.8467222452163696,
{'drop_l1': 0.3448559151858719,
'drop_l2': 0.3831795967089904,
'drop_l3': 0.15696913898122977,
'drop_l4': 0.2602366164051947,
'drop_l5': 0.6395512387224598,
'activation': 'relu',
'kernel_init': 'glorot_uniform'})
hp_score_board = pd.concat([hp_score_board,
                            pd.DataFrame([{'Phase': 'Coarse Tuning',
                                           'Best Value': study_coarse.best_value,
                                           'Hyperparameters': f'{study_coarse.best_params}'}])],
                           ignore_index=True)  # DataFrame.append was removed in pandas 2.0
hp_score_board
| | Phase | Best Value | Hyperparameters |
|---|---|---|---|
| 0 | Architecture Selection | 0.849667 | {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256} |
| 1 | Coarse Tuning | 0.846722 | {'drop_l1': 0.3448559151858719, 'drop_l2': 0.3831795967089904, 'drop_l3': 0.15696913898122977, 'drop_l4': 0.2602366164051947, 'drop_l5': 0.6395512387224598, 'activation': 'relu', 'kernel_init': 'glorot_uniform'} |
study_coarse.trials_dataframe()
| | number | value | datetime_start | datetime_complete | duration | params_activation | params_drop_l1 | params_drop_l2 | params_drop_l3 | params_drop_l4 | params_drop_l5 | params_kernel_init | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.846722 | 2021-03-12 16:19:22.588286 | 2021-03-12 17:04:47.238640 | 0 days 00:45:24.650354 | relu | 0.344856 | 0.383180 | 0.156969 | 0.260237 | 0.639551 | glorot_uniform | COMPLETE |
| 1 | 1 | 0.757778 | 2021-03-12 16:19:22.592620 | 2021-03-12 17:04:30.220932 | 0 days 00:45:07.628312 | relu | 0.562592 | 0.727399 | 0.210686 | 0.768815 | 0.467108 | he_normal | COMPLETE |
| 2 | 2 | 0.760056 | 2021-03-12 16:19:22.603590 | 2021-03-12 17:05:37.318691 | 0 days 00:46:14.715101 | sigmoid | 0.677948 | 0.243250 | 0.139576 | 0.639807 | 0.671406 | he_normal | COMPLETE |
| 3 | 3 | 0.735333 | 2021-03-12 16:19:22.604959 | 2021-03-12 17:04:48.176686 | 0 days 00:45:25.571727 | relu | 0.141385 | 0.843889 | 0.536300 | 0.429097 | 0.810392 | he_normal | COMPLETE |
| 4 | 4 | 0.494944 | 2021-03-12 16:19:22.607689 | 2021-03-12 17:04:46.890767 | 0 days 00:45:24.283078 | relu | 0.824780 | 0.594454 | 0.180418 | 0.193783 | 0.101099 | he_normal | COMPLETE |
| 5 | 5 | 0.844778 | 2021-03-12 16:19:22.609011 | 2021-03-12 17:04:32.537117 | 0 days 00:45:09.928106 | relu | 0.361056 | 0.384413 | 0.286314 | 0.171737 | 0.255207 | glorot_uniform | COMPLETE |
| 6 | 6 | 0.807500 | 2021-03-12 17:04:30.229417 | 2021-03-12 17:32:38.904035 | 0 days 00:28:08.674618 | relu | 0.260269 | 0.355687 | 0.495354 | 0.221666 | 0.876341 | glorot_uniform | COMPLETE |
| 7 | 7 | 0.811611 | 2021-03-12 17:04:32.547443 | 2021-03-12 17:32:41.260201 | 0 days 00:28:08.712758 | sigmoid | 0.606302 | 0.137713 | 0.480722 | 0.477957 | 0.160804 | glorot_normal | COMPLETE |
| 8 | 8 | 0.367444 | 2021-03-12 17:04:46.895408 | 2021-03-12 17:32:52.447206 | 0 days 00:28:05.551798 | relu | 0.551692 | 0.538158 | 0.879462 | 0.177667 | 0.857832 | he_normal | COMPLETE |
| 9 | 9 | 0.725611 | 2021-03-12 17:04:47.243128 | 2021-03-12 17:32:51.164889 | 0 days 00:28:03.921761 | sigmoid | 0.682270 | 0.265327 | 0.454898 | 0.597927 | 0.649483 | he_uniform | COMPLETE |
| 10 | 10 | 0.786278 | 2021-03-12 17:04:48.185468 | 2021-03-12 17:32:52.374428 | 0 days 00:28:04.188960 | relu | 0.273971 | 0.646250 | 0.818514 | 0.632114 | 0.682199 | he_uniform | COMPLETE |
| 11 | 11 | 0.488889 | 2021-03-12 17:05:37.343822 | 2021-03-12 17:33:09.500785 | 0 days 00:27:32.156963 | relu | 0.648469 | 0.219385 | 0.269878 | 0.354306 | 0.898138 | he_normal | COMPLETE |
optuna.visualization.plot_parallel_coordinate(study_coarse)
optuna.visualization.plot_optimization_history(study_coarse)
optuna.visualization.plot_slice(study_coarse)
coarse_best_params = study_coarse.best_params
coarse_best_params, study_coarse.best_value
({'drop_l1': 0.3448559151858719,
'drop_l2': 0.3831795967089904,
'drop_l3': 0.15696913898122977,
'drop_l4': 0.2602366164051947,
'drop_l5': 0.6395512387224598,
'activation': 'relu',
'kernel_init': 'glorot_uniform'},
0.8467222452163696)
drop_rate_per_layer = [coarse_best_params[f'drop_l{i}'] for i in range(1, arch_best_params['num_layers'] + 1)]
drop_rate_per_layer
coarse_best_params['drop_rate_per_layer'] = drop_rate_per_layer
# Remove the per-layer scalar keys now that they are consolidated into one list.
for i in range(1, arch_best_params['num_layers'] + 1):
    coarse_best_params.pop(f'drop_l{i}', None)
display(Markdown("#### Modified best parameters after Coarse Tuning :"))
print(coarse_best_params)
hp_best_params = {**arch_best_params , **coarse_best_params}
display(Markdown("#### Consolidate best parameters so far :"))
print(hp_best_params)
{'activation': 'relu', 'kernel_init': 'glorot_uniform', 'drop_rate_per_layer': [0.3448559151858719, 0.3831795967089904, 0.15696913898122977, 0.2602366164051947, 0.6395512387224598]}
{'num_layers': 5, 'arch': 'D B A', 'neurons_per_layer': [256, 1024, 256, 512, 256], 'activation': 'relu', 'kernel_init': 'glorot_uniform', 'drop_rate_per_layer': [0.3448559151858719, 0.3831795967089904, 0.15696913898122977, 0.2602366164051947, 0.6395512387224598]}
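The key consolidation performed above (collecting `drop_l1` … `drop_lN` into a single `drop_rate_per_layer` list and dropping the scalar keys) can be sketched with plain dictionaries. The `consolidate` helper and the toy values below are illustrative, not part of the notebook:

```python
def consolidate(best_params, prefix, num_layers, new_key):
    """Collect prefix_l1 .. prefix_lN into one list and drop the scalar keys."""
    params = dict(best_params)  # work on a copy, leave the input untouched
    params[new_key] = [params.pop(f"{prefix}_l{i}") for i in range(1, num_layers + 1)]
    return params

coarse = {"drop_l1": 0.34, "drop_l2": 0.38, "drop_l3": 0.16,
          "activation": "relu", "kernel_init": "glorot_uniform"}
merged = consolidate(coarse, "drop", 3, "drop_rate_per_layer")
print(merged["drop_rate_per_layer"])  # [0.34, 0.38, 0.16]
```

Working on a copy avoids the subtle aliasing that the in-place `del` loop above has to be careful about.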
def create_optimizer(trial):
    # We optimize the choice of optimizer as well as its hyperparameters.
    kwargs = {}
    optimizer_options = ["RMSprop", "Adam", "SGD", "Adadelta"]
    optimizer_selected = trial.suggest_categorical("optimizer", optimizer_options)
    if optimizer_selected == "RMSprop":
        kwargs["learning_rate"] = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
        kwargs["rho"] = trial.suggest_float("rho", 0.85, 0.99)
        kwargs["momentum"] = trial.suggest_float("momentum", 1e-5, 1e-1, log=True)
        kwargs["epsilon"] = trial.suggest_float("epsilon", 1e-9, 1e-2, log=True)
    elif optimizer_selected == "Adam":
        kwargs["learning_rate"] = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
        kwargs["epsilon"] = trial.suggest_float("epsilon", 1e-9, 1e-2, log=True)
        kwargs["beta_1"] = trial.suggest_float("beta_1", 0.85, 0.99)
        kwargs["beta_2"] = trial.suggest_float("beta_2", 0.85, 0.99)
    elif optimizer_selected == "SGD":
        kwargs["learning_rate"] = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
        kwargs["momentum"] = trial.suggest_float("momentum", 1e-5, 1e-1, log=True)
        kwargs["nesterov"] = trial.suggest_categorical("nesterov", [True, False])
    elif optimizer_selected == 'Adadelta':
        kwargs["learning_rate"] = trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)
        kwargs["rho"] = trial.suggest_float("rho", 0.85, 0.99)
    optimizer = getattr(tf.optimizers, optimizer_selected)(**kwargs)
    return optimizer
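The branching in `create_optimizer` means each optimizer only ever sees its own hyperparameters, so Optuna never suggests, say, `rho` for SGD. A minimal sketch of this conditional-search-space pattern, using a stand-in trial object so it runs without Optuna or TensorFlow (`FakeTrial` and `optimizer_kwargs` below are illustrative, not part of the notebook):

```python
class FakeTrial:
    """Replays preset values through the Optuna suggest_* interface (illustration only)."""
    def __init__(self, preset):
        self.preset = preset
    def suggest_categorical(self, name, choices):
        return self.preset[name]
    def suggest_float(self, name, low, high, log=False):
        return self.preset[name]

def optimizer_kwargs(trial):
    # Mirrors create_optimizer above: suggest only the parameters
    # that the selected optimizer actually accepts.
    selected = trial.suggest_categorical("optimizer", ["RMSprop", "Adam", "SGD", "Adadelta"])
    kwargs = {"learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-1, log=True)}
    if selected in ("RMSprop", "Adadelta"):
        kwargs["rho"] = trial.suggest_float("rho", 0.85, 0.99)
    if selected in ("RMSprop", "SGD"):
        kwargs["momentum"] = trial.suggest_float("momentum", 1e-5, 1e-1, log=True)
    return selected, kwargs

name, kw = optimizer_kwargs(FakeTrial({"optimizer": "SGD",
                                       "learning_rate": 0.01,
                                       "momentum": 0.001}))
print(name, sorted(kw))  # SGD ['learning_rate', 'momentum']
```

Because unused parameters are never suggested, they show up as `NaN` columns in `trials_dataframe()`, exactly as in the tables below.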
def objective_fine(trial):
    global hp_best_params, loss, metric, Xc_train, yc_train, Xc_test, yc_test, Xc_val, yc_val
    epochs = 100
    batch_size = 200
    optimizer = create_optimizer(trial)
    kwargs = {'opts': optimizer,
              'n_epochs': epochs,
              'batch_size': batch_size}
    kwargs = {**hp_best_params, **kwargs}
    dnn_1 = DNN_Model('classification')
    m_1 = dnn_1.create_model(Xc_train.shape[1], n_class, **kwargs)
    callbacks = []
    # Pass the callbacks list directly; wrapping it as [callbacks] would yield [[]].
    dnn_1.train_model(m_1, Xc_train, yc_train, [loss], [metric], callbacks=callbacks, validation_data=(Xc_val, yc_val))
    result = dnn_1.test_model(Xc_test, yc_test)
    return result[1]
study_fine = optuna.create_study(direction='maximize',sampler=optuna.samplers.TPESampler())
study_fine.optimize(objective_fine, n_trials=12, n_jobs=-1)
[I 2021-03-12 17:34:27,439] A new study created in memory with name: no-name-41ec4730-0cf2-445a-b0f1-5092df73c105
[I 2021-03-12 18:02:18,218] Trial 5 finished with value: 0.8162222504615784 and parameters: {'optimizer': 'Adam', 'learning_rate': 1.245217751367679e-05, 'epsilon': 3.642450587270571e-05, 'beta_1': 0.9157611913958704, 'beta_2': 0.8662032664944924}. Best is trial 5 with value: 0.8162222504615784.
[I 2021-03-12 18:02:19,901] Trial 1 finished with value: 0.8345000147819519 and parameters: {'optimizer': 'SGD', 'learning_rate': 0.008574446705878685, 'momentum': 0.004173203783790507, 'nesterov': True}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:02:22,266] Trial 2 finished with value: 0.7906666398048401 and parameters: {'optimizer': 'SGD', 'learning_rate': 0.000839644457847426, 'momentum': 5.111918305368254e-05, 'nesterov': True}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:03:01,547] Trial 4 finished with value: 0.7747222185134888 and parameters: {'optimizer': 'RMSprop', 'learning_rate': 0.0012178622575253967, 'rho': 0.9418312575058575, 'momentum': 4.967591650141032e-05, 'epsilon': 1.1896751497394832e-08}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:03:08,449] Trial 3 finished with value: 0.8306666612625122 and parameters: {'optimizer': 'RMSprop', 'learning_rate': 0.00012718619842263585, 'rho': 0.887716844443124, 'momentum': 0.00011071008810785081, 'epsilon': 0.0018760304190320583}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:03:12,693] Trial 0 finished with value: 0.1176111102104187 and parameters: {'optimizer': 'Adadelta', 'learning_rate': 1.5320667688319526e-05, 'rho': 0.8543792591685918}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:30:22,878] Trial 6 finished with value: 0.47761112451553345 and parameters: {'optimizer': 'Adam', 'learning_rate': 0.0031454181852359773, 'epsilon': 1.9249087627039325e-06, 'beta_1': 0.8623736110437803, 'beta_2': 0.8672057650856403}. Best is trial 1 with value: 0.8345000147819519.
[I 2021-03-12 18:31:10,001] Trial 7 finished with value: 0.8399444222450256 and parameters: {'optimizer': 'RMSprop', 'learning_rate': 0.00025171035706235336, 'rho': 0.9649465556556495, 'momentum': 0.0001073860430937625, 'epsilon': 1.3506596973662052e-08}. Best is trial 7 with value: 0.8399444222450256.
[I 2021-03-12 18:31:12,949] Trial 8 finished with value: 0.8405555486679077 and parameters: {'optimizer': 'RMSprop', 'learning_rate': 9.348349475421753e-05, 'rho': 0.9796096485818788, 'momentum': 2.027902369214106e-05, 'epsilon': 3.749549720836366e-05}. Best is trial 8 with value: 0.8405555486679077.
[I 2021-03-12 18:31:39,701] Trial 9 finished with value: 0.7586110830307007 and parameters: {'optimizer': 'RMSprop', 'learning_rate': 0.02149529566029854, 'rho': 0.9151184390332241, 'momentum': 0.0020034880231672844, 'epsilon': 7.021589256062627e-05}. Best is trial 8 with value: 0.8405555486679077.
[I 2021-03-12 18:31:47,477] Trial 10 finished with value: 0.8180000185966492 and parameters: {'optimizer': 'Adadelta', 'learning_rate': 0.02568705599439039, 'rho': 0.850553864656622}. Best is trial 8 with value: 0.8405555486679077.
[I 2021-03-12 18:31:54,224] Trial 11 finished with value: 0.10122222453355789 and parameters: {'optimizer': 'Adadelta', 'learning_rate': 4.4973860366001693e-05, 'rho': 0.9898155111522466}. Best is trial 8 with value: 0.8405555486679077.
fine_best_params = dict(study_fine.best_params)  # copy, so popping the optimizer name below does not mutate the study's record
fine_best_params, study_fine.best_value
({'optimizer': 'RMSprop',
'learning_rate': 9.348349475421753e-05,
'rho': 0.9796096485818788,
'momentum': 2.027902369214106e-05,
'epsilon': 3.749549720836366e-05},
0.8405555486679077)
hp_score_board = pd.concat([hp_score_board,
                            pd.DataFrame([{'Phase': 'Fine Tuning - Optimizer',
                                           'Best Value': study_fine.best_value,
                                           'Hyperparameters': f'{study_fine.best_params}'}])],
                           ignore_index=True)  # DataFrame.append was removed in pandas 2.0
hp_score_board
| | Phase | Best Value | Hyperparameters |
|---|---|---|---|
| 0 | Architecture Selection | 0.849667 | {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256} |
| 1 | Coarse Tuning | 0.846722 | {'drop_l1': 0.3448559151858719, 'drop_l2': 0.3831795967089904, 'drop_l3': 0.15696913898122977, 'drop_l4': 0.2602366164051947, 'drop_l5': 0.6395512387224598, 'activation': 'relu', 'kernel_init': 'glorot_uniform'} |
| 2 | Fine Tuning - Optimizer | 0.840556 | {'optimizer': 'RMSprop', 'learning_rate': 9.348349475421753e-05, 'rho': 0.9796096485818788, 'momentum': 2.027902369214106e-05, 'epsilon': 3.749549720836366e-05} |
# Build the optimizer from the fine-tuned parameters.
sel_opti = fine_best_params['optimizer']
fine_best_params.pop('optimizer', None)
fine_best_params
optimizer = getattr(tf.optimizers, sel_opti)(**fine_best_params)
hp_best_params['opts'] = optimizer
hp_best_params
{'num_layers': 5,
'arch': 'D B A',
'neurons_per_layer': [256, 1024, 256, 512, 256],
'activation': 'relu',
'kernel_init': 'glorot_uniform',
'drop_rate_per_layer': [0.3448559151858719,
0.3831795967089904,
0.15696913898122977,
0.2602366164051947,
0.6395512387224598],
'opts': <tensorflow.python.keras.optimizer_v2.rmsprop.RMSprop at 0x7f080759e190>}
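The `getattr(tf.optimizers, sel_opti)(**fine_best_params)` call above is a small string-to-class factory: look the class up by name, then build it from the remaining keyword arguments. The same pattern can be shown without TensorFlow (the `Optimizers` namespace and `RMSprop` class below are stand-ins, not real TF classes):

```python
import types

class RMSprop:
    """Stand-in for tf.optimizers.RMSprop (illustration only)."""
    def __init__(self, learning_rate=0.001, rho=0.9, momentum=0.0, epsilon=1e-7):
        self.learning_rate, self.rho = learning_rate, rho
        self.momentum, self.epsilon = momentum, epsilon

# A namespace that plays the role of the tf.optimizers module.
Optimizers = types.SimpleNamespace(RMSprop=RMSprop)

best = {"optimizer": "RMSprop", "learning_rate": 9.35e-05, "rho": 0.98}
sel = best.pop("optimizer")             # remove the name before **-expansion
opt = getattr(Optimizers, sel)(**best)  # look the class up by name, then build it
print(type(opt).__name__, opt.rho)  # RMSprop 0.98
```

Popping `'optimizer'` first is essential: leaving it in would pass an unexpected `optimizer=` keyword to the constructor.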
study_fine.trials_dataframe()
| | number | value | datetime_start | datetime_complete | duration | params_beta_1 | params_beta_2 | params_epsilon | params_learning_rate | params_momentum | params_nesterov | params_optimizer | params_rho | state |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.117611 | 2021-03-12 17:34:27.444557 | 2021-03-12 18:03:12.692971 | 0 days 00:28:45.248414 | NaN | NaN | NaN | 0.000015 | NaN | NaN | Adadelta | 0.854379 | COMPLETE |
| 1 | 1 | 0.834500 | 2021-03-12 17:34:27.447824 | 2021-03-12 18:02:19.901164 | 0 days 00:27:52.453340 | NaN | NaN | NaN | 0.008574 | 0.004173 | True | SGD | NaN | COMPLETE |
| 2 | 2 | 0.790667 | 2021-03-12 17:34:27.449379 | 2021-03-12 18:02:22.265685 | 0 days 00:27:54.816306 | NaN | NaN | NaN | 0.000840 | 0.000051 | True | SGD | NaN | COMPLETE |
| 3 | 3 | 0.830667 | 2021-03-12 17:34:27.451238 | 2021-03-12 18:03:08.449414 | 0 days 00:28:40.998176 | NaN | NaN | 1.876030e-03 | 0.000127 | 0.000111 | NaN | RMSprop | 0.887717 | COMPLETE |
| 4 | 4 | 0.774722 | 2021-03-12 17:34:27.453104 | 2021-03-12 18:03:01.546435 | 0 days 00:28:34.093331 | NaN | NaN | 1.189675e-08 | 0.001218 | 0.000050 | NaN | RMSprop | 0.941831 | COMPLETE |
| 5 | 5 | 0.816222 | 2021-03-12 17:34:27.453938 | 2021-03-12 18:02:18.217846 | 0 days 00:27:50.763908 | 0.915761 | 0.866203 | 3.642451e-05 | 0.000012 | NaN | NaN | Adam | NaN | COMPLETE |
| 6 | 6 | 0.477611 | 2021-03-12 18:02:18.228254 | 2021-03-12 18:30:22.878297 | 0 days 00:28:04.650043 | 0.862374 | 0.867206 | 1.924909e-06 | 0.003145 | NaN | NaN | Adam | NaN | COMPLETE |
| 7 | 7 | 0.839944 | 2021-03-12 18:02:19.906765 | 2021-03-12 18:31:10.000385 | 0 days 00:28:50.093620 | NaN | NaN | 1.350660e-08 | 0.000252 | 0.000107 | NaN | RMSprop | 0.964947 | COMPLETE |
| 8 | 8 | 0.840556 | 2021-03-12 18:02:22.277772 | 2021-03-12 18:31:12.949144 | 0 days 00:28:50.671372 | NaN | NaN | 3.749550e-05 | 0.000093 | 0.000020 | NaN | RMSprop | 0.979610 | COMPLETE |
| 9 | 9 | 0.758611 | 2021-03-12 18:03:01.553807 | 2021-03-12 18:31:39.700553 | 0 days 00:28:38.146746 | NaN | NaN | 7.021589e-05 | 0.021495 | 0.002003 | NaN | RMSprop | 0.915118 | COMPLETE |
| 10 | 10 | 0.818000 | 2021-03-12 18:03:08.455977 | 2021-03-12 18:31:47.477029 | 0 days 00:28:39.021052 | NaN | NaN | NaN | 0.025687 | NaN | NaN | Adadelta | 0.850554 | COMPLETE |
| 11 | 11 | 0.101222 | 2021-03-12 18:03:12.699554 | 2021-03-12 18:31:54.224157 | 0 days 00:28:41.524603 | NaN | NaN | NaN | 0.000045 | NaN | NaN | Adadelta | 0.989816 | COMPLETE |
optuna.visualization.plot_parallel_coordinate(study_fine)
optuna.visualization.plot_optimization_history(study_fine)
optuna.visualization.plot_slice(study_fine)
hp_best_params
{'num_layers': 5,
'arch': 'D B A',
'neurons_per_layer': [256, 1024, 256, 512, 256],
'activation': 'relu',
'kernel_init': 'glorot_uniform',
'drop_rate_per_layer': [0.3448559151858719,
0.3831795967089904,
0.15696913898122977,
0.2602366164051947,
0.6395512387224598],
'opts': <tensorflow.python.keras.optimizer_v2.rmsprop.RMSprop at 0x7f080759e190>}
def objective_final(trial):
    global hp_best_params, loss, metric, Xc_train, yc_train, Xc_test, yc_test, Xc_val, yc_val
    # momentum = trial.suggest_uniform('momentum', 0.9, 0.99)
    # epsilon = trial.suggest_loguniform('epsilon', 0.0005, 0.05)
    # beta_init_std = trial.suggest_categorical('beta_init_std', [1.0, 0.5, 0.05, 0.005, 0.005])
    # gamma_init = trial.suggest_categorical('gamma_init', [0.8, 0.9, 1.0])
    # center = trial.suggest_categorical('center', [True, False])
    # scale = trial.suggest_categorical('scale', [True, False])
    batch_size = trial.suggest_int('batch_size', 100, 1000, step=100)
    n_epochs = trial.suggest_categorical('n_epochs', [50, 100, 200, 500, 1000])
    kwargs = {
        # 'momentum': momentum,
        # 'epsilon': epsilon,
        # 'beta_init_std': beta_init_std,
        # 'gamma_init': gamma_init,
        # 'center': center,
        # 'scale': scale,
        'batch_size': batch_size,
        'n_epochs': n_epochs
    }
    kwargs = {**hp_best_params, **kwargs}
    dnn_1 = DNN_Model('classification')
    m_1 = dnn_1.create_model(Xc_train.shape[1], n_class, **kwargs)
    callbacks = []
    # Pass the callbacks list directly; wrapping it as [callbacks] would yield [[]].
    dnn_1.train_model(m_1, Xc_train, yc_train, [loss], [metric], callbacks=callbacks, validation_data=(Xc_val, yc_val))
    result = dnn_1.test_model(Xc_test, yc_test)
    return result[1]
study_final = optuna.create_study(direction='maximize',sampler=optuna.samplers.TPESampler())
study_final.optimize(objective_final, n_trials=12, n_jobs=-1)
[I 2021-03-12 19:10:44,273] A new study created in memory with name: no-name-13945a2f-a30a-4d54-85d0-cb313b987f70
[I 2021-03-12 19:25:29,589] Trial 4 finished with value: 0.8134444355964661 and parameters: {'batch_size': 200, 'n_epochs': 50}. Best is trial 4 with value: 0.8134444355964661.
[I 2021-03-12 19:39:46,345] Trial 3 finished with value: 0.8412222266197205 and parameters: {'batch_size': 400, 'n_epochs': 100}. Best is trial 3 with value: 0.8412222266197205.
[I 2021-03-12 19:39:47,115] Trial 2 finished with value: 0.8347222208976746 and parameters: {'batch_size': 900, 'n_epochs': 100}. Best is trial 3 with value: 0.8412222266197205.
[I 2021-03-12 19:39:47,904] Trial 1 finished with value: 0.8351666927337646 and parameters: {'batch_size': 100, 'n_epochs': 100}. Best is trial 3 with value: 0.8412222266197205.
[I 2021-03-12 19:39:52,622] Trial 0 finished with value: 0.8466110825538635 and parameters: {'batch_size': 800, 'n_epochs': 100}. Best is trial 0 with value: 0.8466110825538635.
[I 2021-03-12 20:09:00,414] Trial 8 finished with value: 0.8300555348396301 and parameters: {'batch_size': 600, 'n_epochs': 100}. Best is trial 0 with value: 0.8466110825538635.
[I 2021-03-12 20:09:00,634] Trial 9 finished with value: 0.8443333506584167 and parameters: {'batch_size': 300, 'n_epochs': 100}. Best is trial 0 with value: 0.8466110825538635.
[I 2021-03-12 20:21:40,032] Trial 6 finished with value: 0.8581666946411133 and parameters: {'batch_size': 700, 'n_epochs': 200}. Best is trial 6 with value: 0.8581666946411133.
[I 2021-03-12 21:43:52,677] Trial 10 finished with value: 0.879444420337677 and parameters: {'batch_size': 600, 'n_epochs': 500}. Best is trial 10 with value: 0.879444420337677.
[I 2021-03-12 21:44:16,148] Trial 7 finished with value: 0.8813889026641846 and parameters: {'batch_size': 400, 'n_epochs': 500}. Best is trial 7 with value: 0.8813889026641846.
[I 2021-03-12 23:02:53,056] Trial 5 finished with value: 0.8906111121177673 and parameters: {'batch_size': 500, 'n_epochs': 1000}. Best is trial 5 with value: 0.8906111121177673.
[I 2021-03-12 23:41:40,197] Trial 11 finished with value: 0.8866666555404663 and parameters: {'batch_size': 100, 'n_epochs': 1000}. Best is trial 5 with value: 0.8906111121177673.
final_best_params = study_final.best_params
final_best_params, study_final.best_value
({'batch_size': 500, 'n_epochs': 1000}, 0.8906111121177673)
hp_score_board = pd.concat([hp_score_board,
                            pd.DataFrame([{'Phase': 'Final Tuning - Epoch/Batch Size',
                                           'Best Value': study_final.best_value,
                                           'Hyperparameters': f'{study_final.best_params}'}])],
                           ignore_index=True)  # DataFrame.append was removed in pandas 2.0
hp_score_board
| | Phase | Best Value | Hyperparameters |
|---|---|---|---|
| 0 | Architecture Selection | 0.849667 | {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256} |
| 1 | Coarse Tuning | 0.846722 | {'drop_l1': 0.3448559151858719, 'drop_l2': 0.3831795967089904, 'drop_l3': 0.15696913898122977, 'drop_l4': 0.2602366164051947, 'drop_l5': 0.6395512387224598, 'activation': 'relu', 'kernel_init': 'glorot_uniform'} |
| 2 | Fine Tuning - Optimizer | 0.840556 | {'optimizer': 'RMSprop', 'learning_rate': 9.348349475421753e-05, 'rho': 0.9796096485818788, 'momentum': 2.027902369214106e-05, 'epsilon': 3.749549720836366e-05} |
| 3 | Final Tuning - Epoch/Batch Size | 0.890611 | {'batch_size': 500, 'n_epochs': 1000} |
study_final.trials_dataframe()
| | number | value | datetime_start | datetime_complete | duration | params_batch_size | params_n_epochs | state |
|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.846611 | 2021-03-12 19:10:44.278569 | 2021-03-12 19:39:52.621597 | 0 days 00:29:08.343028 | 800 | 100 | COMPLETE |
| 1 | 1 | 0.835167 | 2021-03-12 19:10:44.279966 | 2021-03-12 19:39:47.904338 | 0 days 00:29:03.624372 | 100 | 100 | COMPLETE |
| 2 | 2 | 0.834722 | 2021-03-12 19:10:44.280952 | 2021-03-12 19:39:47.115452 | 0 days 00:29:02.834500 | 900 | 100 | COMPLETE |
| 3 | 3 | 0.841222 | 2021-03-12 19:10:44.281781 | 2021-03-12 19:39:46.345027 | 0 days 00:29:02.063246 | 400 | 100 | COMPLETE |
| 4 | 4 | 0.813444 | 2021-03-12 19:10:44.282877 | 2021-03-12 19:25:29.588651 | 0 days 00:14:45.305774 | 200 | 50 | COMPLETE |
| 5 | 5 | 0.890611 | 2021-03-12 19:10:44.285854 | 2021-03-12 23:02:53.055799 | 0 days 03:52:08.769945 | 500 | 1000 | COMPLETE |
| 6 | 6 | 0.858167 | 2021-03-12 19:25:29.595424 | 2021-03-12 20:21:40.031584 | 0 days 00:56:10.436160 | 700 | 200 | COMPLETE |
| 7 | 7 | 0.881389 | 2021-03-12 19:39:46.351985 | 2021-03-12 21:44:16.147565 | 0 days 02:04:29.795580 | 400 | 500 | COMPLETE |
| 8 | 8 | 0.830056 | 2021-03-12 19:39:47.125013 | 2021-03-12 20:09:00.414208 | 0 days 00:29:13.289195 | 600 | 100 | COMPLETE |
| 9 | 9 | 0.844333 | 2021-03-12 19:39:47.915539 | 2021-03-12 20:09:00.634270 | 0 days 00:29:12.718731 | 300 | 100 | COMPLETE |
| 10 | 10 | 0.879444 | 2021-03-12 19:39:52.629718 | 2021-03-12 21:43:52.676859 | 0 days 02:04:00.047141 | 600 | 500 | COMPLETE |
| 11 | 11 | 0.886667 | 2021-03-12 20:09:00.422519 | 2021-03-12 23:41:40.197642 | 0 days 03:32:39.775123 | 100 | 1000 | COMPLETE |
optuna.visualization.plot_parallel_coordinate(study_final)
optuna.visualization.plot_optimization_history(study_final)
optuna.visualization.plot_slice(study_final)
model_best_params = { **hp_best_params, **final_best_params}
model_best_params
{'num_layers': 5,
'arch': 'D B A',
'neurons_per_layer': [256, 1024, 256, 512, 256],
'activation': 'relu',
'kernel_init': 'glorot_uniform',
'drop_rate_per_layer': [0.3448559151858719,
0.3831795967089904,
0.15696913898122977,
0.2602366164051947,
0.6395512387224598],
'opts': <tensorflow.python.keras.optimizer_v2.rmsprop.RMSprop at 0x7f080759e190>,
'batch_size': 500,
'n_epochs': 1000}
# model_best_params['n_epochs'] = 10
# model_best_params['batch_size'] = 100
# model_best_params
dnn_final = DNN_Model('classification')
m_final = dnn_final.create_model(Xc_train.shape[1],n_class,**model_best_params)
callbacks = LivePlot(1,train_loss='loss',train_metric='accuracy')
# callbacks=[]
dnn_final.train_model(m_final,Xc_train,yc_train,[loss],[metric],validation_data = (Xc_val,yc_val),callbacks=[callbacks])
result_final = dnn_final.test_model(Xc_test,yc_test)
display(Markdown("### Model Summary"))
m_final.summary()  # summary() prints directly; wrapping it in print() adds a stray "None"
display(Markdown(f"### Test Loss : {result_final[0]} Test Accuracy : {result_final[1]}"))
Model: "model_23"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_24 (InputLayer)        [(None, 484)]             0
dense_93 (Dense)             (None, 256)               124160
dropout_70 (Dropout)         (None, 256)               0
batch_normalization_70 (Batc (None, 256)               768
activation_70 (Activation)   (None, 256)               0
dense_94 (Dense)             (None, 1024)              263168
dropout_71 (Dropout)         (None, 1024)              0
batch_normalization_71 (Batc (None, 1024)              3072
activation_71 (Activation)   (None, 1024)              0
dense_95 (Dense)             (None, 256)               262400
dropout_72 (Dropout)         (None, 256)               0
batch_normalization_72 (Batc (None, 256)               768
activation_72 (Activation)   (None, 256)               0
dense_96 (Dense)             (None, 512)               131584
dropout_73 (Dropout)         (None, 512)               0
batch_normalization_73 (Batc (None, 512)               1536
activation_73 (Activation)   (None, 512)               0
dense_97 (Dense)             (None, 256)               131328
dropout_74 (Dropout)         (None, 256)               0
batch_normalization_74 (Batc (None, 256)               768
activation_74 (Activation)   (None, 256)               0
dense_98 (Dense)             (None, 10)                2570
=================================================================
Total params: 922,122
Trainable params: 917,514
Non-trainable params: 4,608
_________________________________________________________________
hp_score_board = pd.concat([hp_score_board,
                            pd.DataFrame([{'Phase': 'Model Testing - Train/Test',
                                           'Best Value': f'{loss}: {result_final[0]} - {metric}: {result_final[1]}',
                                           'Hyperparameters': f'{model_best_params}'}])],
                           ignore_index=True)  # DataFrame.append was removed in pandas 2.0
hp_score_board
| | Phase | Best Value | Hyperparameters |
|---|---|---|---|
| 0 | Architecture Selection | 0.849667 | {'num_layers': 5, 'arch': 'D B A', 'neuron_l1': 256, 'neuron_l2': 1024, 'neuron_l3': 256, 'neuron_l4': 512, 'neuron_l5': 256} |
| 1 | Coarse Tuning | 0.846722 | {'drop_l1': 0.3448559151858719, 'drop_l2': 0.3831795967089904, 'drop_l3': 0.15696913898122977, 'drop_l4': 0.2602366164051947, 'drop_l5': 0.6395512387224598, 'activation': 'relu', 'kernel_init': 'glorot_uniform'} |
| 2 | Fine Tuning - Optimizer | 0.840556 | {'optimizer': 'RMSprop', 'learning_rate': 9.348349475421753e-05, 'rho': 0.9796096485818788, 'momentum': 2.027902369214106e-05, 'epsilon': 3.749549720836366e-05} |
| 3 | Final Tuning - Epoch/Batch Size | 0.890611 | {'batch_size': 500, 'n_epochs': 1000} |
| 4 | Model Testing - Train/Test | categorical_crossentropy: 0.6104946136474609 - accuracy: 0.8848333358764648 | {'num_layers': 5, 'arch': 'D B A', 'neurons_per_layer': [256, 1024, 256, 512, 256], 'activation': 'relu', 'kernel_init': 'glorot_uniform', 'drop_rate_per_layer': [0.3448559151858719, 0.3831795967089904, 0.15696913898122977, 0.2602366164051947, 0.6395512387224598], 'opts': <tensorflow.python.keras.optimizer_v2.rmsprop.RMSprop object at 0x7f080759e190>, 'batch_size': 500, 'n_epochs': 1000} |
# summarize history for accuracy
plt.figure(figsize = (15,8))
plt.plot(dnn_final.history.history['accuracy'])
plt.plot(dnn_final.history.history['val_accuracy'])
plt.title('Final Model Accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['Train', 'Validation'], loc = 'best')
plt.yticks(np.arange(0.0, 1.0, 0.1))
plt.xticks(np.arange(0,model_best_params['n_epochs'],50))
plt.show()
# summarize history for loss
plt.figure(figsize = (15,8))
plt.plot(dnn_final.history.history['loss'])
plt.plot(dnn_final.history.history['val_loss'])
plt.title('Final Model Loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['Train', 'Validation'], loc = 'best')
plt.xticks(np.arange(0,model_best_params['n_epochs'],50))
plt.show()
# Predicting from the test set
pred = m_final.predict(Xc_test)
pred
array([[1.55559480e-01, 8.44209015e-01, 2.88531533e-06, ...,
4.96191096e-05, 5.18644847e-05, 4.78303118e-05],
[2.16083662e-09, 5.93141067e-07, 2.00772961e-03, ...,
9.97991562e-01, 1.65597480e-09, 1.94220817e-08],
[1.78323363e-11, 2.77024115e-11, 1.00000000e+00, ...,
2.91397528e-09, 9.88193832e-11, 2.31493338e-10],
...,
[3.56936314e-10, 2.35589788e-08, 9.22346588e-10, ...,
1.00000000e+00, 1.62329351e-11, 2.30250541e-11],
[1.87696560e-06, 1.14608749e-08, 1.74501675e-08, ...,
8.32656951e-07, 1.59189251e-04, 9.99658465e-01],
[6.07947137e-10, 1.05835418e-09, 9.99998808e-01, ...,
1.10224255e-06, 4.19499713e-10, 1.02411406e-10]], dtype=float32)
# Converting the predicted and ground truth labels from one-hot encoding to integer
pred_label = [np.argmax(i) for i in pred]
yc_test_label = [np.argmax(i) for i in yc_test]
pred_label[10], yc_test_label[10]
(8, 8)
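The two list comprehensions above can also be written as a single vectorised `argmax` over the class axis, which avoids the Python loop entirely. The toy probability matrix below is illustrative:

```python
import numpy as np

probs = np.array([[0.10, 0.70, 0.20],    # row 0: highest score at class 1
                  [0.05, 0.15, 0.80]])   # row 1: highest score at class 2
labels = probs.argmax(axis=1)  # one integer label per row
print(labels.tolist())  # [1, 2]
```

On an 18,000-row prediction matrix this is both faster and shorter than per-row `np.argmax` calls.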
display(Markdown('#### <u>Classification Report</u>'))
print(classification_report(yc_test_label, pred_label))
              precision    recall  f1-score   support

           0       0.83      0.94      0.88      1814
           1       0.87      0.88      0.87      1828
           2       0.93      0.87      0.90      1803
           3       0.93      0.82      0.87      1719
           4       0.90      0.92      0.91      1812
           5       0.88      0.90      0.89      1768
           6       0.85      0.89      0.87      1832
           7       0.87      0.92      0.90      1808
           8       0.93      0.84      0.88      1812
           9       0.89      0.87      0.88      1804

    accuracy                           0.88     18000
   macro avg       0.89      0.88      0.88     18000
weighted avg       0.89      0.88      0.88     18000
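For programmatic access to the per-class numbers printed above, `classification_report` also accepts `output_dict=True` and returns a nested dictionary instead of a formatted string. The toy labels below are illustrative:

```python
from sklearn.metrics import classification_report

y_true = [0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 2, 2, 2, 1]
rep = classification_report(y_true, y_pred, output_dict=True)
# rep['<class>'] holds 'precision', 'recall', 'f1-score', and 'support'
print(rep["0"]["precision"], rep["2"]["recall"])
```

This form is convenient for logging individual class metrics (e.g. into a score board DataFrame) rather than parsing the printed table.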
display(Markdown('#### <center><u>Visualizing the confusion matrix</u></center>'))
plt.figure(figsize = (15, 10))
sns.heatmap(confusion_matrix(yc_test_label, pred_label), annot = True);
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show();
The cropped single-digit images proved easier to classify than the alternative. Categorical cross-entropy was used as the loss function and accuracy as the performance metric for training and testing the ANN. A multi-stage strategy was applied to tune the hyperparameters of the ANN:
Stage 1: Architecture selection - accuracy of 0.849667.
Stage 2: Coarse tuning - accuracy of 0.846722.
Stage 3: Fine tuning - accuracy of 0.840556.
Stage 4: Final tuning - accuracy of 0.890611.
The final model reached a test accuracy of 0.8848 with a test loss of 0.6105.